Development of a novel mycobiome diagnostic for fungal infection

Background: Amplicon-based mycobiome analysis has the potential to identify all fungal species within a sample and hence could provide a valuable diagnostic assay for use in clinical mycology settings. In the last decade, the mycobiome has been increasingly characterised by targeting the internal transcribed spacer (ITS) regions. Although ITS targets give broad coverage and high sensitivity, they fail to provide accurate quantitation because the copy number of ITS regions in fungal genomes is highly variable, even within species. To address these issues, this study aimed to develop a novel NGS fungal diagnostic assay using an alternative amplicon target.

Methods: Novel universal primers were designed to amplify a highly diverse, single-copy and uniformly sized DNA target (Tef1) to enable mycobiome analysis on the Illumina iSeq100, a low-cost, small-footprint and simple-to-use next-generation sequencing platform. To enable automated analysis and rapid results, a streamlined bioinformatics workflow and sequence database were also developed. Mock fungal communities were sequenced to compare the Tef1 assay with an established ITS1-based method. The assay was further evaluated using clinical respiratory samples, and the feasibility of using internal spike-in quantitative controls was assessed.

Results: The Tef1 assay successfully identified and quantified Aspergillus, Penicillium, Candida, Cryptococcus, Rhizopus, Fusarium and Lomentospora species from mock communities. The Tef1 assay was also capable of differentiating closely related species such as A. fumigatus and A. fischeri. In addition, it outperformed ITS1 at identifying A. fumigatus and other filamentous pathogens in mixed fungal communities (in the presence or absence of background human DNA). The assay could detect as few as 2 haploid genome equivalents of A. fumigatus from clinical respiratory samples. Lastly, spike-in controls were demonstrated to enable semi-quantitation of A. fumigatus load in clinical respiratory samples using sequencing data.

Conclusions: This study has developed and tested a novel metabarcoding target and found that the assay outperforms ITS1 at identifying clinically relevant filamentous fungi. The assay is a promising diagnostic candidate that could provide affordable NGS analysis to clinical mycology laboratories.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12866-024-03197-5.
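The spike-in semi-quantitation mentioned in the Results can be illustrated with a minimal sketch. It assumes a simple proportional model that the abstract does not spell out: because the target is single copy, the reads assigned to a species are scaled by the reads-per-copy ratio observed for a spike-in control added at a known copy number. The function name, parameter names and read counts below are hypothetical and are not taken from the study.

```python
def estimate_genome_equivalents(target_reads: int,
                                spike_reads: int,
                                spike_copies: float) -> float:
    """Estimate target load from a spike-in control (assumed proportional model).

    target_reads : reads assigned to the species of interest
    spike_reads  : reads assigned to the spike-in control
    spike_copies : known number of spike-in copies added to the sample
    """
    if spike_reads <= 0:
        # Without spike-in reads there is no reference ratio to scale by.
        raise ValueError("spike-in control was not detected; load cannot be estimated")
    # Reads per copy is assumed equal for target and spike-in.
    return target_reads / spike_reads * spike_copies


# Hypothetical counts for illustration only.
load = estimate_genome_equivalents(target_reads=1500, spike_reads=3000, spike_copies=100)
print(load)  # 50.0
```

Under this assumption the estimate is only semi-quantitative: differences in amplification efficiency between the spike-in and the target would bias the ratio.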

) and TEF (D-F). In a number of these samples, both targets identified sequences from species that were not present in the sample (false positives). Tracking results from other sequencing runs revealed that cross-contamination of barcoded primer sets accounted for these false positives in the TEF assay (D, E), whereas the ITS1 assay could not discriminate between P. rubens and P. chrysogenum (B).

Figure S2. In silico PCR performance of ITS1 and TEF targets. A. PrimerTree was used to search primers against the NCBI non-redundant nucleotide database. Unrooted trees indicate the taxonomic kingdom of identified sequences for ITS1 (left) and TEF1α (right). B. PrimerTree searches were repeated against fungal sequences only. Radial trees indicate the fungal phyla of hits for ITS1 (top) and TEF1α (bottom).

Figure S3. Aspergillus speciation of ITS1 and TEF1α amplicons. TEF1α (top) and ITS1 (bottom) amplicon sequences (without primer regions) were obtained for 131 Aspergillus species. Alignments were performed using Clustal. Phylogenetic trees were generated using the Maximum Likelihood method and Tamura-Nei model, with 500 bootstraps. Alignments and trees were created in MEGA X. Tree images were generated using iTOL. Coloured boxes indicate examples of species complexes (Aspergillus fumigatus in red, Aspergillus flavus in grey and Aspergillus niger in blue).

Figure S5. Corresponding raw count data for mock community results in Figure 3. Two representative fungal mock community analyses are shown. Each community contained 5 species, was targeted by TEF (A) and ITS1 (B) in duplicate and sequenced on an Illumina iSeq and MiSeq. For ITS1 samples, iSeq sequencing was performed twice. Bar plots indicate raw species count data and the expected mock community (dashed green box, arbitrarily set to 100,000 total counts).

Figure S7. Single species analyses. Bar plots indicate species proportions above a 0.2% cutoff. Aspergillus fumigatus (A), Candida albicans (B) and Cryptococcus neoformans (C) were each targeted by ITS1 and three barcoded TEF primer sets (BC1-3). BC1 primer sets had been used previously, whereas BC2-3 primer sets contained novel barcode sequences and were previously unused solutions.
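The 0.2% cutoff used for these bar plots can be sketched as a simple filter on per-species read proportions. This is an illustrative sketch only: the function name and the read counts are hypothetical, and the study's own workflow may apply the threshold differently (for example, before or after other filtering steps).

```python
def species_proportions(counts: dict[str, int], cutoff: float = 0.002) -> dict[str, float]:
    """Return per-species read proportions strictly above `cutoff` (0.002 = 0.2%)."""
    total = sum(counts.values())
    if total == 0:
        # No reads were assigned, so there is nothing to report.
        return {}
    return {
        species: n / total
        for species, n in counts.items()
        if n / total > cutoff
    }


# Hypothetical read counts for a single-species sample with low-level background reads.
example_counts = {
    "Aspergillus fumigatus": 9980,
    "Candida albicans": 15,
    "Penicillium rubens": 5,
}
kept = species_proportions(example_counts)
print(kept)  # only Aspergillus fumigatus passes the 0.2% cutoff
```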

Figure S8. TEF species detection in mock fungal communities is not significantly hindered by bacterial gDNA background. A. Boxplot of percent abundance for species spiked at 2% within mock communities targeted by TEF. Expected percent abundance is indicated by the dashed black line. The dotted red line indicates no identification. Percent abundances are grouped by bacterial DNA background status (~2000-fold of a 50:50 mix of E. coli and P. aeruginosa). Species-level quantifications differed significantly with and without human background for C. neoformans (*, P < 0.05 or ***, P < 0.0001 by Wilcoxon rank sum test). For all other species that were identified in a sample, quantifications did not differ significantly (P ≥ 0.05 by Wilcoxon rank sum test). Wilcoxon rank sum tests were not performed for species with fewer than 3 data points per human background status; these are indicated with 'ND'. B. Heatmap representation of TEF identification (ID) rates for species spiked at 2% within mock communities, by human DNA background status. Human DNA resulted in no change in detection for any species tested.
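The testing rule in this legend (a Wilcoxon rank sum test per species, reported as 'ND' when a group has fewer than 3 data points) can be sketched as follows. This is a standard-library illustration that uses the normal approximation without tie or continuity correction; the legend does not say which software or which variant of the test the authors used, and the function name and the abundance values are hypothetical.

```python
import math


def rank_sum_test(x, y, min_points=3):
    """Two-sided Wilcoxon rank sum p-value (normal approximation).

    Returns the string "ND" when either group has fewer than `min_points` values.
    """
    n1, n2 = len(x), len(y)
    if n1 < min_points or n2 < min_points:
        return "ND"

    # Pool both groups, remembering group membership (0 = x, 1 = y).
    pooled = sorted((value, group) for group, values in enumerate((x, y)) for value in values)

    # Sum the ranks of group x, giving tied values their average rank.
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        rank_sum_x += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - expected) / sd
    # Two-sided p-value under the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical percent abundances with and without background DNA.
with_background = [1.8, 2.1, 1.9, 2.3]
without_background = [2.0, 2.2, 1.7, 2.4]
print(rank_sum_test(with_background, without_background))
print(rank_sum_test([1.9, 2.0], without_background))  # ND
```

With groups this small an exact test would usually be preferred; the normal approximation is used here only to keep the sketch dependency-free.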

Figure S11. Raw gel images for Figure S1. A. Raw, uncropped images of the gels included in Figure S1C. One complete gel (left) was cropped, whereas the other gel (right) was cropped to include only the samples indicated by the red box. Lanes 9, 10 and 11 of this gel (right) contain samples not described in this paper. B. Raw, uncropped image of the gel included in Figure S1D. The section of gel included in Figure S1D is indicated by the red box. All other samples are not described in this paper. C. Original image of the cropped gel in Figure S1E. The section of gel included in Figure S1E is indicated by the red box. All other samples are not described in this paper. The full image of this gel is not available. The original scan file from the Bio-Rad Gel Doc instrument is available for this image in the supplemental data.